Logistic Regression with Structured Sparsity

نویسندگان

  • Nikhil S. Rao
  • Robert D. Nowak
  • Christopher R. Cox
  • Timothy T. Rogers
چکیده

Binary logistic regression with a sparsity constraint on the solution plays a vital role in many high dimensional machine learning applications. In some cases, the features can be grouped together, so that entire subsets of features can be selected or zeroed out. In many applications, however, this can be very restrictive. In this paper, we are interested in a less restrictive form of structured sparse feature selection: we assume that while features can be grouped according to some notion of similarity, not all features in a group need be selected for the task at hand. This is sometimes referred to as a “sparse group” lasso procedure, and it allows for more flexibility than traditional group lasso methods. Our framework generalizes conventional sparse group lasso further by allowing for overlapping groups, an additional flexiblity that presents further challenges. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization program that automatically selects similar features for learning in high dimensions. We establish consistency results for the SOSlasso for classification problems using the logistic regression setting, which specializes to results for the lasso and the group lasso, some known and some new. In particular, SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features, source localization problems in Magnetoencephalography (MEG), and analyzing gene activation patterns in microarray data analysis. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Simple Training of Dependency Parsers via Structured Boosting

Recently, significant progress has been made on learning structured predictors via coordinated training algorithms such as conditional random fields and maximum margin Markov networks. Unfortunately, these techniques are based on specialized training algorithms, are complex to implement, and expensive to run. We present a much simpler approach to training structured predictors by applying a boo...

متن کامل

Lazy Sparse Stochastic Gradient Descent for Regularized Mutlinomial Logistic Regression

Stochastic gradient descent efficiently estimates maximum likelihood logistic regression coefficients from sparse input data. Regularization with respect to a prior coefficient distribution destroys the sparsity of the gradient evaluated at a single example. Sparsity is restored by lazily shrinking a coefficient along the cumulative gradient of the prior just before the coefficient is needed. 1...

متن کامل

Fast Algorithms for Structured Sparsity

Sparsity has become an important tool in many mathematical sciences such as statistics, machine learning, and signal processing. While sparsity is a good model for data in many applications, data often has additional structure that goes beyond the notion of “standard” sparsity. In many cases, we can represent this additional information in a structured sparsity model. Recent research has shown ...

متن کامل

Joint Estimation of Structured Sparsity and Output Structure in Multiple-Output Regression via Inverse-Covariance Regularization

We consider the problem of learning a sparse regression model for predicting multiple related outputs given high-dimensional inputs, where related outputs are likely to share common relevant inputs. Most of the previous methods for learning structured sparsity assumed that the structure over the outputs is known a priori, and focused on designing regularization functions that encourage structur...

متن کامل

Solving Logistic Regression with Group Cardinality Constraints for Time Series Analysis

We propose an algorithm to distinguish 3D+t images of healthy from diseased subjects by solving logistic regression based on cardinality constrained, group sparsity. This method reduces the risk of overfitting by providing an elegant solution to identifying anatomical regions most impacted by disease. It also ensures that consistent identification across the time series by grouping each image f...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1402.4512  شماره 

صفحات  -

تاریخ انتشار 2014